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ABSTRACT 

A new parallel pipelined feed forward architecture for real-time signal is proposed. A hardware oriented radix-2 
algorithm is derived by integrating a twiddle factor decomposition technique in the divide and conquer approach. 
The butterfly structure of radix-2 algorithm is modified in accordance with the flow of signal. The new butterfly structures 
are designed to handle the hybrid data path which consists of real & complex data paths. The proposed approach which can 
be extended to all radix-2 k based DITFFT & DIFFFT algorithms. The zero frequency is computed without processing the 
zero input data. The symmetry property is applicable to minimize the stage computation in half of the actual stage. 
The proposed radix-2 k feed forward architectures need to use fewer hardware resources in hardware architecture. 
The proposed radix 2 k architectures lead to low hardware complexity with respect to adders and delays. The N-point 
4-parallel radix-2 2 architecture requires 4(logi 6 N-l) complex multipliers, log 2 N real adders and N-l complex delay 
elements. 
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INTRODUCTION 

The Fast Fourier Transform (FFT) is an essential algorithm in digital signal processing. It is employed in various 
applications such as radar, wireless communication, medical imaging, spectral analysis, and acoustics. Fast Fourier 
Transforms have been implemented on different platforms, ranging from general purpose processors to specially designed 
computer chips. There are two main types of pipelined architectures: feedback and feed forward. On the one hand, 
feedback architectures are characterized by their feedback loops, i.e., some outputs of the butterflies are fed back to the 
memories at the same stage. 

Feedback architectures can be divided into two main types of pipelined architectures: feedback and feed forward. 
On the one hand, feedback architectures are characterized by their feedback loops, i.e., some outputs of the butterflies are 
fed back to the memories at the same stage. Feedback architectures can be divided into single-path delay feedback which 
process a continuous flow of one sample per clock cycle, and multi-path delay feedback or parallel feedback which process 
several samples in parallel. There has been an increasing interest in the computation of FFT for real valued signals, since 
virtually most of the physical signals are real. The real valued signals which are of prime importance in real-time signal 
processing exhibit conjugate symmetry giving rise to redundancies. This property could be exploited to reduce both 
arithmetic and memory complexities. The RFFT plays an important role in different fields such as communication systems, 
biomedical applications, sensors and radar signal processing. 
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A low complexity implementation of RFFT can reduce the power consumption in implantable or portable devices. 
Even though specific algorithms for the computation of the RFFT have been proposed in the past, these algorithms lack 
regular geometries to design pipelined architectures. A novel approach to design efficient pipelined architectures for 
RFFT was presented for the first time in. This architecture is obtained by modifying the radix-2 flow graph to achieve real 
data paths and processes 4 samples in parallel. 

This approach is specific to radix-2 algorithm and was limited to a 4-parallelarchitecture. A general approach 
which can be extended to other radix-2" algorithms is needed to take advantage of less number of multiplication operations 
in higher radix algorithms. 

RADIX-2 2 DIF FFT ALGORITHM 

By the observations made in last section the most desirable hardware oriented algorithm will be that it has the 
same number of non-trivial multiplications at the same positions in the SFG as of radix-4 algorithms, but has the same 
butterfly structure as that of radix-2 algorithms. The clear derivation of the algorithm in DIF form with perception of 
reducing the hardware requirement in the context pipeline FFT processor is, however, yet to be developed. 

The DFT of size N is defined by 

X p = '^p, pe{o ( i ( ..,Jv-i} 

(l) 

W = 

then 

Ai-l 

K p = W"* y pe{0.1 JV-1} 

r,=U (2) 

Where W N denotes the N th primitive root of unity, with its exponent evaluated modulo N. To make the derivation 
of the new algorithm clearer, consider the first 2 steps of decomposition in the radix-2 DIF FFT together. 

_ | j 

(3) 

The key idea of the new algorithm is to proceed the second step decomposition to the remaining DFT coefficients, 

fN \ 
( T n 2+ n 3 

including the "twiddle factor" Wjz to exploit the exceptional values in multiplication before the next butterfly is 

constructed. Decomposing the composite twiddle factor and observe that 
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Figure 1: Different Stages of Computation 

Radix-2 2 algorithm has the feature that it has the same multiplicative complexity as radix-4 algorithms, but still 
retains the radix-2 butterfly structures. The multiplicative operations are in a such an arrangement that only every other 
stage has non-trivial multiplications. This is a great structural advantage over other algorithms when pipeline/cascade 
FFT architecture is under consideration. 




Figure 2: Feedforward Structure Radix-2 2 DIF FFT 

In the first step, the FFT flow graph is modified by removing the redundant samples. Most of the approaches in 
literature compute the frequencies with indexes k=[0, N/2]. Figure 2 shows the flow graph of 16-point FFT decimated in 
frequency. The boxed regions show the redundant samples which can be removed from the flow graph. 

These samples and the corresponding computations can be removed from the flow graph. The flow graph after 
removing the redundant samples will be irregular. This flow graph is suitable to implement efficient in-place architectures, 
but not for efficient pipelined architectures. 

Figure 3 shows the modified flow graph which is obtained by rearranging the imaginary operations in place of 
redundant operations. All the edges in Figure 3 are real, i.e., the data have been separated into real and imaginary parts. 

The continuous edges compute the real parts and the broken edges represent the computations of the imaginary 
parts. The output samples of the flow graph include a letter (r or i) to indicate if the value corresponds to the real part or the 
imaginary part of the output. 

Further, in higher point FFTs, a complex sample (real and imaginary components computed separately) may need 
to be multiplied by a complex twiddle factor, i.e., a full complex multiplier is required. 
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Process two complex samples but 
needs only addition operation 
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Figure 3: Adapted Flow Graph of a Radix-2 2 DIF FFT 

In that situation, the proposed modifications will not lead to a regular flow graph. This problem can be solved by 
reordering the data before the multiplier to have corresponding samples at the input of the multiplier at the same time. 
At the architecture level, this problem can be solved by introducing an extra stage of reordering circuit before the 
multiplier. We modify the multiplier stage similar to that of the butterfly stage by adding a reordering circuit. 
This will increase the latency by a few cycles and the number of delay elements depending on the stage at which 
multipliers are required. 

PROPOSED ARCHITECTURES 

In this section, we present 2-parallel and 4-parallel architectures for real FFT computation based on radix-2 3 and 
radix-2 4 algorithms using the proposed methodology. 

2-Parallel Architecture 

Figure 4 shows a general N-point 2-parallel architecture based on the radix-2 3 algorithm, where N is a power of 8. 
The bottom part will be repeated log 8 N-2 times. In general, for an N-point RFFT, with N power of 2, the 2-parallel 
architecture requires 21og 2 N real adders, log 8 N-l complex multipliers, 2 (log 8 N-l) CSD(W 8 64 ) multipliers and 
3N/2-2 real delay elements. Few extra delays are required for interchanging real and imaginary components before twiddle 
factor multiplications. 

4-Parallel Architecture 

Figure 5 shows a general N-point 4-parallel architecture based on radix-2 3 algorithm, where N is a power of 8. 
The bottom part will be repeated times. In a general case of N-point RFFT, the 4-parallel architecture requires 41og 2 N real 
adders, 2 (log 8 N-l) complex multipliers, 4 log 8 N-5 CSD W 8 multipliers and 7N/4-4 delay elements. 
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Figure 4: Proposed 2-Parallel Architecture 
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Figure 5: Proposed 4-Parallel Architecture 



SYNTHESIS RESULTS 



The architectures depends on the required number of multipliers, adders, delay elements. The performance is 
represented by throughput. The proposed designs are the only specific approaches for the computation of RFFT. 
The symmetric property in proposed method is used to reduce the hardware requirement. 

The proposed architectures are feed-forward. The pipelining stages are added as necessary to increase the 
frequency of operation. Much parallel pipelined architecture to compute FFT with complex inputs have been proposed in 
the literature based on radix- 2" algorithms. 




Algorithm 


Radix-2 2 


FFT Size 


16 


Frequency 


200MHZ 


Area 


2132um 2 


Power 


5.2mW 
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Figure 7 



COMPARISON OF FFT 



The comparison of Fast Fourier Transform & Real Value Signal Fast Fourier Transform can be done for knowing 
the computation time. And also we can improvise the concept by changing some parameters. 




CONCLUSIONS 



This parallel pipelined feed forward architecture paper has presented to design efficient architectures for the 
computation of RFFT. The computation can be completed by limiting the signal flow graph diagram into real & imaginary 
paths. This approach is mainly used to effective utilization of memory and to avoid reusability. The approach can be 
implemented for higher radix algorithms. Based on the modified flow graph and hybrid data path design using radix-2 3 and 
radix-2 4 algorithms, novel 2-parallel and 4-parallel pipelined architectures are developed. This approach can also be 
designed for decimation-in-time (DIT) algorithms. The same approach is applicable for complex FFT, the DIT 
architectures have need of less delay elements than DIF architectures. 
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